    Beyond Data and Quality – Unleashing the Value of Citizen Contributions

    Citizen Science projects generate outcomes that have the potential to be highly valuable for both science and society. These contributions are not restricted to data; they can also be other kinds of results, e.g. best practices, new insights or research questions. Giving Citizen Science a true impact requires us to unleash the value of citizen contributions. This touches a wide range of points and goes beyond mere data quality considerations. Challenges include, but are not restricted to:
    • How can not only data, but also other outcomes of Citizen Science projects be captured and made accessible to others?
    • How can we make sure that citizen contributions are findable, accessible and interpretable by others, and how can we enable interoperability with other data and knowledge sources (i.e. adhere to the FAIR principles)?
    • How can we assess and improve the quality and reliability of Citizen Science outcomes and increase their credibility?
    • How can we make sure that data collected by citizen scientists are useful and relevant for addressing scientific questions?
    • How can we enable citizen scientists to gain and apply insights from project outcomes?
    This talk is meant as an impulse for further discussion and scholarly exchange on these topics, fostering a systematic approach to addressing these challenges.

    Effective decision support for semantic web service selection

    The objective of this dissertation is to demonstrate the feasibility of the vision of the Internet of Services based on Semantic Web Services by suggesting an approach to end-user-mediated Semantic Web Service selection. Our main contribution is an incremental and interactive approach to requirements elicitation and service selection that is inspired by example-critiquing recommender systems. It alternates phases of intermediate service recommendation with phases of informal requirements specification. During this process, the user incrementally develops their service requirements and preferences and finally makes a selection decision. We demonstrate how the requirements elicitation and service selection process can be directed and focused to effectively reduce the system's uncertainty about the user's service requirements and thus contribute to the efficiency of the service selection process. To acquire information about the actual performance of available services, and thus about the risk associated with their execution, we propose a flexible feedback system that leverages consumer experiences reported from past service interactions. In particular, we provide means to describe a service's performance in detail with respect to its multiple facets. This is supplemented by a user-adaptive method that effectively assists service consumers in providing such feedback, as well as a privacy-preserving technique for feedback propagation. We also demonstrate that available consumer feedback can be effectively exploited to assess the degree and kind of risk associated with the execution of an offered service, and show how the user can be made aware of this risk. In contrast to many other approaches related to Semantic Web Service technology, we performed an extensive and thorough evaluation of our contribution and documented its results, which show the effectiveness and efficiency of our approach.
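
    To make the example-critiquing idea concrete, the following minimal Python sketch shows one elicitation round: candidate services are ranked against weighted preferences, and a critique ("reliability matters more than price") re-weights the preferences before the next recommendation. All service names, facets and weights are hypothetical; the dissertation's actual algorithms are considerably more elaborate.

        from dataclasses import dataclass

        @dataclass
        class Service:
            name: str
            facets: dict  # hypothetical quality facets, each scaled to [0, 1]

        def score(service, preferences):
            # Weighted sum of facet values; higher is better.
            return sum(w * service.facets.get(f, 0.0) for f, w in preferences.items())

        def recommend(services, preferences, k=2):
            return sorted(services, key=lambda s: score(s, preferences), reverse=True)[:k]

        services = [
            Service("WeatherA", {"price": 0.95, "latency": 0.8, "reliability": 0.3}),
            Service("WeatherB", {"price": 0.3, "latency": 0.9, "reliability": 0.8}),
            Service("WeatherC", {"price": 0.5, "latency": 0.5, "reliability": 0.9}),
        ]
        preferences = {"price": 1.0, "latency": 1.0, "reliability": 1.0}

        print([s.name for s in recommend(services, preferences)])   # ['WeatherA', 'WeatherB']
        # The user critiques the intermediate recommendation: reliability up, price down.
        preferences["reliability"] += 1.0
        preferences["price"] -= 0.5
        print([s.name for s in recommend(services, preferences)])   # ['WeatherB', 'WeatherC']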

    Open Data Kit Goes Semantic - A Contribution to the Interpretability and Interoperability of Citizen Science Data

    In Citizen Science, data collection via mobile applications plays an increasingly important role. In recent years, a whole range of software frameworks has emerged that enable the easy creation of surveys and the collection of data through smartphone applications. While these frameworks support the creation and execution of such data collection campaigns, data export is usually limited to standard tabular formats such as CSV or Excel. Since only little metadata is collected alongside the actual data, the semantics of the data (what was measured/observed, and how?) often remain uncaptured. This hampers the reuse of such Citizen Science data beyond their initial context, as interpretability and integration with other data are impaired. Our contribution presents a method and an accompanying implementation that enable researchers to easily enrich their campaign surveys semantically and to export the collected data flexibly. In addition to classical formats such as XML, export to RDF (Linked Open Data) is supported, which allows the data to be linked with their machine-readable meaning. The implementation was realized as an extension of the widely used data collection framework Open Data Kit 1 (ODK1) and is freely available. It thus contributes to the interoperability and interpretability of Citizen Science data.
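
    As an illustration of what "linking data with their machine-readable meaning" can look like, the Python sketch below annotates a single survey observation as RDF using rdflib and the W3C SOSA observation vocabulary. The namespace, property choices and values are hypothetical; this is not the actual ODK1 extension described above.

        from rdflib import Graph, Literal, Namespace, RDF
        from rdflib.namespace import XSD

        EX = Namespace("http://example.org/campaign/")   # hypothetical campaign namespace
        SOSA = Namespace("http://www.w3.org/ns/sosa/")   # W3C observation vocabulary

        g = Graph()
        g.bind("sosa", SOSA)

        # One hypothetical citizen observation: a water temperature reading.
        obs = EX["observation/42"]
        g.add((obs, RDF.type, SOSA.Observation))
        g.add((obs, SOSA.observedProperty, EX.waterTemperature))
        g.add((obs, SOSA.hasSimpleResult, Literal(18.5, datatype=XSD.float)))
        g.add((obs, SOSA.resultTime, Literal("2019-06-01T10:15:00", datatype=XSD.dateTime)))

        print(g.serialize(format="turtle"))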

    What happens where during disasters? A Workflow for the multifaceted characterization of crisis events based on Twitter data

    Twitter data are a valuable source of information for rescue and relief activities in the case of natural disasters and technical accidents. Several methods for disaster- and event-related tweet filtering and classification are available to analyse social media streams. Rather than processing single tweets, taking space and time into account is likely to reveal even more insights regarding local event dynamics and impacts on population and environment. This study focuses on the design and evaluation of a generic workflow for Twitter data analysis that leverages this additional information to characterize crisis events more comprehensively. The workflow covers data acquisition, analysis and visualization, and aims at providing a multifaceted and detailed picture of events that happen in affected areas. This is approached by utilizing agile and flexible analysis methods that provide different and complementary views on the data. Utilizing state-of-the-art deep learning and clustering methods, we investigate whether our workflow is suitable for reconstructing and picturing the course of events during major natural disasters from Twitter data. Experimental results obtained with a data set acquired during Hurricane Florence in September 2018 demonstrate the effectiveness of the applied methods, but also point to further interesting research questions and directions.
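
    As a toy illustration of the space-time perspective, the sketch below bins geotagged tweets by hour and by a coarse latitude/longitude grid cell so that local bursts of activity become visible. The tweets, the cell size and the binning scheme are hypothetical simplifications of the workflow's actual analysis methods.

        from collections import Counter
        from datetime import datetime

        tweets = [  # hypothetical geotagged tweets
            {"time": "2018-09-14T08:12:00", "lat": 34.23, "lon": -77.95, "text": "flooding on 5th street"},
            {"time": "2018-09-14T08:40:00", "lat": 34.21, "lon": -77.93, "text": "power is out"},
            {"time": "2018-09-14T11:05:00", "lat": 35.10, "lon": -78.88, "text": "roads closed"},
        ]

        def space_time_bin(tweet, cell=0.1):
            # Bin key: hour of day plus coordinates snapped to a ~10 km grid cell.
            hour = datetime.fromisoformat(tweet["time"]).strftime("%Y-%m-%d %H:00")
            return (hour, round(tweet["lat"] / cell) * cell, round(tweet["lon"] / cell) * cell)

        counts = Counter(space_time_bin(t) for t in tweets)
        for (hour, lat, lon), n in counts.most_common():
            print(f"{hour}  ({lat:.1f}, {lon:.1f})  {n} tweet(s)")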

    Does Term Expansion Matter for the Retrieval of Biodiversity Data?

    While term expansion techniques are well investigated for many domains, semantic enrichment of keyword queries for the retrieval of scientific datasets has so far received little attention. In particular, a systematic analysis of which kinds of semantically related concepts lead to the most relevant results is missing. Based on query expansion techniques, we semantically enriched search queries provided by biodiversity researchers to answer specific research questions. We applied them to a system indexing over 92,856 biological metadata files harvested from GFBio, the German Federation for Biological Data, and compared the outcome with the original keyword-based queries. The results reveal that enriched keywords deliver a larger number of relevant datasets and that datasets retrieved based on keywords and their synonyms were judged more relevant. Query expansion with other related concepts returned a mixed picture.
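
    For illustration, here is a minimal sketch of the kind of expansion applied: each keyword is OR-ed with its synonyms before being submitted to the dataset index; related concepts could be appended in the same way. The synonym table is a hypothetical stand-in for a terminology service or ontology lookup.

        # Hypothetical synonym table; in practice this would come from a
        # terminology service or domain ontology.
        SYNONYMS = {
            "beech": ["fagus sylvatica"],
            "abundance": ["population density", "species count"],
        }

        def expand(query_terms):
            clauses = []
            for term in query_terms:
                variants = [term] + SYNONYMS.get(term.lower(), [])
                clauses.append("(" + " OR ".join(f'"{v}"' for v in variants) + ")")
            return " AND ".join(clauses)

        print(expand(["beech", "abundance"]))
        # ("beech" OR "fagus sylvatica") AND ("abundance" OR "population density" OR "species count")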

    Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets

    The Twitter Stream API offers the possibility to develop (near) real-time methods and applications for detecting and monitoring the impacts of crisis events and their changes over time. As demonstrated by various related research, the content of individual tweets or even entire thematic trends can be utilized to support disaster management, fill information gaps, augment the results of satellite-based workflows, and extend and improve disaster management databases. Considering the sheer volume of incoming tweets, it is necessary to automatically identify the small number of crisis-relevant tweets and present them in a manageable way.

    Current approaches for identifying crisis-related content focus on supervised models that decide on the relevance of each tweet individually. Although supervised models can efficiently process the high number of incoming tweets, they have to be extensively pre-trained. Furthermore, such models do not capture the history of already processed messages. During a crisis, various unique sub-events can occur that are likely not covered by the respective supervised model and its training data. Unsupervised learning offers both the ability to take tweets from the past into account and a higher adaptive capability, which in turn allows customization to the specific needs of different disasters. From a practical point of view, drawbacks of unsupervised methods are the higher computational costs and the potential need for user interaction during result interpretation.

    In order to enhance the limited generalization capabilities of pre-trained models as well as to speed up and guide unsupervised learning, we propose a combination of both concepts. A successive clustering of incoming tweets semantically aggregates the stream data, whereas pre-trained models identify potentially crisis-relevant clusters. Besides the identification of potentially crisis-related content based on semantically aggregated clusters, this approach offers a sound foundation for visualizations and further related tasks, such as event detection and the extraction of detailed information about the temporal or spatial development of events.

    Our work focuses on analyzing the entire freely available Twitter stream by combining interval-based semantic clustering with a supervised machine learning model for identifying crisis-related messages. The stream is divided into intervals, e.g. of one hour, and each tweet is projected into a numerical vector using state-of-the-art sentence embeddings. The embeddings are then grouped by a parametric Chinese Restaurant Process clustering. At the end of each interval, a pre-trained feed-forward neural network decides whether a cluster contains crisis-related tweets. With the further developed concepts of cluster chains and central centroids, crisis-related clusters of different intervals can be linked in a topic- and even subtopic-related manner.

    Initial results show that the hybrid approach can significantly improve the results of pre-trained supervised methods. This is especially true for categories for which the supervised model could not be sufficiently pre-trained due to missing labels. In addition, the semantic clustering of tweets offers a flexible and customizable procedure, resulting in a practical summary of topic-specific stream content.
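
    As a much-simplified stand-in for the interval-based clustering step, the sketch below assigns tweet embeddings sequentially to the nearest cluster centroid and opens a new cluster when none is close enough. The actual approach uses a parametric Chinese Restaurant Process over pre-trained sentence embeddings; the random vectors and the distance threshold here are purely hypothetical.

        import numpy as np

        def sequential_cluster(embeddings, threshold):
            centroids, members = [], []
            for i, v in enumerate(embeddings):
                if centroids:
                    dists = [np.linalg.norm(v - c) for c in centroids]
                    j = int(np.argmin(dists))
                    if dists[j] < threshold:
                        members[j].append(i)
                        # Incrementally update the cluster centroid (running mean).
                        centroids[j] = centroids[j] + (v - centroids[j]) / len(members[j])
                        continue
                # No sufficiently close cluster: open a new one.
                centroids.append(v.astype(float))
                members.append([i])
            return centroids, members

        rng = np.random.default_rng(0)
        fake_embeddings = rng.normal(size=(10, 8))  # stand-in for sentence embeddings
        centroids, members = sequential_cluster(fake_embeddings, threshold=4.0)
        print(f"{len(members)} clusters:", members)

        # At the end of each interval, a pre-trained classifier would then score
        # each cluster (via its centroid or member tweets) as crisis-related or not.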

    Towards an Interactive Approach for Ontology Recommendation and Reuse

    Ontologies are machine-comprehensible and reusable pieces of knowledge designed to explicitly define the semantics of an application domain, using a set of concepts, properties that relate concepts to each other or to literals, and a set of individuals. When deciding to develop an ontology for a new application domain, ontology engineers face the question of whether to reuse existing ontologies or to build a new ontology from scratch. In conceptually diverse domains, such as biodiversity, building an ontology from scratch is an expensive and time-consuming process. In such cases, it is a better choice to reuse existing ontologies or parts of them. In general, ontology reuse is defined as the process whereby existing ontologies, along with possibly other non-ontological resources, are identified and used for building new integrated ontologies as part of a knowledge base. A case study on ontology reuse in different domains that we conducted revealed that ontology reuse is done either manually or semi-automatically, with IT support mainly focusing on the retrieval and recommendation of existing ontologies based on their conceptual coverage. This contrasts with the fact that manual ontology engineering and reuse, especially in complex domains, requires great effort from both ontology engineers and domain experts. Moreover, the ontology reuse process is inherently incremental, as an ontology is developed step by step and evolves over time. This aspect is not considered by existing tools, which typically make one-shot recommendations. In our talk, we present the concept of a tool which supports interactive ontology recommendation and reuse in order to assist ontology engineers and domain experts in their task of generating an ontological knowledge base for a specific application domain. The tool will have the following features: a) it allows the user to specify a (potentially empty) seed ontology as a starting point for the new ontology; b) based on a set of candidate ontologies and textual input describing the specified domain, it identifies, extracts and recommends pieces of the candidate ontologies (properties, concepts, textual and formal specifications of concepts) that might be used to extend the seed ontology. In an interactive and iterative process, the user selects recommended pieces, which are automatically integrated with the seed ontology. The system ensures that the resulting ontology is consistent and complies with the domain semantics intended by the user. This is achieved by the use of logical reasoning and the provision of explanations and proper visualizations.
    KEYWORDS: Ontology engineering, Interactive Ontology Recommendation, Ontology Reuse
    ACKNOWLEDGEMENT: This work is partly funded by the DAAD project “BioDialog”. We are particularly grateful to Prof. Birgitta König-Ries and Dr. Alsayed Algergaw
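
    A toy sketch of a single recommendation round is given below: candidate concepts are ranked by the lexical overlap of their labels and definitions with the textual domain description, and the user's selections are merged into the seed ontology. Real candidates would come from existing ontologies and be checked for consistency with a reasoner; everything shown here is hypothetical.

        domain_text = "observations of plant species traits in forest habitats"
        domain_words = set(domain_text.lower().split())

        # Hypothetical candidate concepts (label -> textual definition).
        candidates = {
            "Habitat": "an area in which a species lives",
            "Trait": "a measurable property of a plant or animal",
            "Invoice": "a commercial document for billing",
        }

        def relevance(label, definition):
            # Crude relevance signal: word overlap with the domain description.
            words = set((label + " " + definition).lower().split())
            return len(words & domain_words)

        seed_ontology = set()
        ranked = sorted(candidates, key=lambda c: relevance(c, candidates[c]), reverse=True)
        print("recommended:", ranked)

        # The user picks from the ranking; a reasoner would then verify that the
        # extended seed ontology remains consistent before the next iteration.
        seed_ontology.update(ranked[:2])
        print("seed ontology now contains:", seed_ontology)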

    Combining Supervised and Unsupervised Learning to Detect and Semantically Aggregate Crisis-Related Twitter Content

    Twitter is an immediate and almost ubiquitous platform and can therefore be a valuable source of information during disasters. Current methods for identifying and classifying crisis-related content are often based on single tweets, i.e., already known information from the past is neglected. In this paper, the combination of tweet-wise pre-trained neural networks and unsupervised semantic clustering is proposed and investigated. The intention is to (1) enhance the generalization capability of pre-trained models, (2) handle massive amounts of stream data, (3) reduce information overload by identifying potentially crisis-related content, and (4) obtain a semantically aggregated data representation that allows for further automated, manual and visual analyses. Latent representations of each tweet based on pre-trained sentence embedding models are used for both clustering and tweet classification. For fast, robust and time-continuous processing, subsequent time periods are clustered individually according to a Chinese Restaurant Process. Clusters without any tweet classified as crisis-related are pruned. Data aggregation over time is ensured by merging semantically similar clusters. A comparison of our hybrid method with a similar clustering approach, as well as first quantitative and qualitative results from experiments with two different labeled data sets, demonstrates the great potential of this approach for crisis-related Twitter stream analyses.
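
    The merging of semantically similar clusters across time periods can be pictured with a small sketch: clusters from consecutive intervals are linked (and could then be merged) when their centroids are sufficiently cosine-similar. The centroids and the threshold below are hypothetical placeholders for real sentence-embedding centroids.

        import numpy as np

        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        def link_clusters(prev_centroids, curr_centroids, threshold=0.8):
            links = []
            for i, c in enumerate(curr_centroids):
                sims = [cosine(c, p) for p in prev_centroids]
                j = int(np.argmax(sims))
                if sims[j] >= threshold:
                    links.append((j, i))  # previous cluster j continues as current cluster i
            return links

        # Toy centroids for two consecutive time periods.
        prev = [np.array([1.0, 0.1, 0.0]), np.array([0.0, 1.0, 0.2])]
        curr = [np.array([0.9, 0.2, 0.1]), np.array([0.1, 0.0, 1.0])]
        print(link_clusters(prev, curr))  # [(0, 0)] under these toy vectors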

    Review article: Detection of actionable tweets in crisis events

    Messages on social media can be an important source of information during crisis situations. They can frequently provide details about developments much faster than traditional sources (e.g., official news) and can offer personal perspectives on events, such as opinions or specific needs. In the future, these messages could also serve to assess disaster risks. One challenge for utilizing social media in crisis situations is the reliable detection of relevant messages in a flood of data. Researchers have started to look into this problem in recent years, beginning with crowdsourced methods. Lately, approaches have shifted towards the automatic analysis of messages. A major stumbling block here is the question of exactly which messages are considered relevant or informative, as this depends on the specific usage scenario and the role of the user in that scenario. In this review article, we present methods for the automatic detection of crisis-related messages (tweets) on Twitter. We start by showing the varying definitions of importance and relevance relating to disasters, leading to the concept of use-case-dependent actionability that has recently become more popular and is the focal point of this review. This is followed by an overview of existing crisis-related social media data sets for evaluation and training purposes. We then compare approaches for solving the detection problem based (1) on filtering by characteristics like keywords and location, (2) on crowdsourcing, and (3) on machine learning techniques. We analyze the suitability and limitations of these approaches with regard to actionability. We then point out particular challenges, such as linguistic issues concerning social media data. Finally, we suggest future avenues of research and show connections to related tasks, such as the subsequent semantic classification of tweets.
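
    As a minimal illustration of approach (1), the sketch below filters tweets by crisis keywords and a geographic bounding box. The keyword list and bounding box are hypothetical; as the review discusses, such filters are simple but fall well short of capturing actionability.

        # Hypothetical filter criteria for approach (1): keywords and location.
        CRISIS_KEYWORDS = {"flood", "evacuate", "damage", "rescue"}
        BBOX = (33.5, -79.5, 36.5, -75.5)  # (min_lat, min_lon, max_lat, max_lon)

        def is_candidate(tweet):
            in_bbox = (tweet.get("lat") is not None
                       and BBOX[0] <= tweet["lat"] <= BBOX[2]
                       and BBOX[1] <= tweet["lon"] <= BBOX[3])
            has_keyword = any(k in tweet["text"].lower() for k in CRISIS_KEYWORDS)
            return in_bbox and has_keyword

        tweet = {"text": "Please evacuate now, the river is rising", "lat": 34.2, "lon": -77.9}
        print(is_candidate(tweet))  # True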

    Dataset Search in Biodiversity Research: Do Metadata in Data Repositories Reflect Scholarly Information Needs?

    The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments, or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces large amounts of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently reflect information needs poorly and are therefore the biggest obstacle to retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered by the metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords used in descriptive fields such as title, description or subject. Keywords support scholars in a full-text search only if the provided terms syntactically match the terms used in a user query or their semantic relationship to the query terms is known.